Use PosixFileStream for files on POSIX #1855
// a hack but it does improve perfomance a lot if flushing is not needed
_writeNeedsFlush = writeStream switch {
    FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono
In theory the `FileStream` should already be unbuffered since the `FileIO.OpenFile` helper uses a buffer size of `1`. I'm not sure if it's true in practice, but it looks like at least .NET (Core) has code to handle it...
I've just checked the source code of .NET Framework 4.8 and it also performs unbuffered writes when buffer size is 1. I suspect it is the same on 4.6.2. On Mono, however, buffer bypass only happens if the data length is > buffer size, meaning that single-byte writes still land in the 1-byte buffer, not on the disk. We could then optimize flushing by only flushing if IsMono and data length is 1.
But `FileIO.OpenFile` is not the only way of creating `FileIO`. One can use `os.open` to get a file descriptor and then open a file. Currently, `PythonNT.open`, when creating `FileStream`, uses a buffer of 4K (`DefaultBufferSize`), which has been defined since the initial commit 11 years ago. I will change it to 1 to get consistent unbuffered behaviour, unless you have counterarguments.
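For reference, the behaviour being targeted can be observed on CPython. The sketch below is only an illustration under that assumption (it does not exercise IronPython, `FileStream`, or `PythonNT.open`), and the helper name `sizes_after_one_byte_write` is hypothetical. It compares a descriptor obtained via `os.open` and wrapped without a buffer against a default buffered stream, checking the on-disk size right after a single-byte write.

```python
import os
import tempfile


def sizes_after_one_byte_write(directory):
    """Return on-disk sizes observed around a 1-byte write (hypothetical helper).

    Result: (unbuffered size after write,
             buffered size before flush,
             buffered size after flush)
    """
    unbuffered_path = os.path.join(directory, "unbuffered.bin")
    buffered_path = os.path.join(directory, "buffered.bin")

    # Path 1: get a descriptor with os.open, then wrap it with buffering=0.
    # With buffering=0 in binary mode, open() returns a raw io.FileIO object.
    fd = os.open(unbuffered_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with open(fd, "wb", buffering=0) as raw:
        raw.write(b"x")
        unbuffered_size = os.path.getsize(unbuffered_path)

    # Path 2: default buffered stream; the byte stays in the buffer until flush.
    with open(buffered_path, "wb") as buffered:
        buffered.write(b"x")
        before_flush = os.path.getsize(buffered_path)
        buffered.flush()
        after_flush = os.path.getsize(buffered_path)

    return unbuffered_size, before_flush, after_flush


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(sizes_after_one_byte_write(tmp))
```

On the unbuffered path the byte is visible on disk immediately, while on the buffered path it only appears after `flush()`, which is the same symptom described above for single-byte writes on Mono.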
I think we can also open with a buffer size of 0 if that would help with Mono...
I don't think so. Mono, .NET, and .NET Framework all throw for buffer size <= 0.
Hmm, in the `FileStreamOptions.BufferSize` docs it says `0` or `1` to disable buffering. I was assuming it'd simply pass along the value to the other constructor. Anyway, it looks like .NET Framework does indeed throw for <= 0 so I guess it doesn't help.
You are right, .NET doesn't throw with 0, I was speaking from memory, must have gotten confused with another framework. Or maybe it was throwing in the past (older framework versions).
Anyway, the issue is with Mono, and it does throw at 0, and it does buffer at 1.
_writeNeedsFlush = writeStream switch {
    FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono
    PosixFileStream => false, // PosixFileStream is unbuffered, used on .NET/Posix
    UnixStream => false, // UnixStream is unbuffered, used sometimes on Mono
I guess this is causing the test failures since it'll try to load the `Mono.Posix` assembly (which isn't packaged along with Windows builds).
Indeed, it makes sense... The failure on Linux is a bit more interesting as it is failing when dealing with a file opened in the "a" mode for simultaneous logging.
a5e3975 to 64d0d4a
Looks like you changed it to draft. Did you have additional changes you wanted to do?
@@ -15,6 +15,8 @@
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Serialization;
`System.Runtime.Serialization` is VS trying to be helpful?
It was a non-trivial merge.
This PR replaces `FileStream` with `PosixFileStream` on .NET/POSIX (not Mono), which offers unbuffered access to the file through its actual file descriptor. This addresses issue #1846 on .NET/POSIX.

I decided to write my own implementation of the stream, rather than encapsulating `UnixStream`. Not only did the implementation involve a similar amount of work and complexity as a proxy class to `UnixStream`, but it also allowed me to make different implementation choices than `UnixStream`, some of which I would call problematic. The best example is `UnixStream.Close()`, which retries the `close` syscall if interrupted. This will result in error EBADF at best, and close an unrelated file descriptor at worst. That is because, on most systems (including Linux and macOS), `close` always closes the descriptor, even if the call fails with an error, so a retry is not appropriate. See CPython source code.

Another effect of this PR is that for .NET/POSIX, there are no more "double streams", meaning that operations on file descriptors behave as expected in all cases (modulo bugs and missing pieces).
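To make the `close` hazard mentioned above concrete, here is a minimal Python sketch, assuming CPython on a POSIX system (it does not touch IronPython or `UnixStream`). Both helper names are hypothetical. The first shows that a second `close` on an already released descriptor fails with EBADF. The second shows that when the freed descriptor number has been handed out again in the meantime, the stray second `close` lands on the unrelated file.

```python
import errno
import os
import tempfile


def double_close_errno():
    """Close a descriptor twice; return the errno of the second attempt."""
    fd, path = tempfile.mkstemp()
    try:
        os.close(fd)              # the descriptor is released here
        try:
            os.close(fd)          # simulated "retry" on a released descriptor
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        os.remove(path)


def retry_closes_unrelated_file():
    """Return (reused, victim_closed) for a simulated retry of close().

    reused:        the second file received the descriptor number just freed
    victim_closed: the second file was closed by the stray retry
    """
    fd_a, path_a = tempfile.mkstemp()
    os.close(fd_a)                         # first, real close
    fd_b, path_b = tempfile.mkstemp()      # may receive the number fd_a had
    reused = fd_b == fd_a
    try:
        try:
            os.close(fd_a)                 # mistaken retry of the first close
        except OSError:
            pass                           # EBADF when the number was not reused
        try:
            os.write(fd_b, b"x")           # is the unrelated file still usable?
        except OSError:
            victim_closed = True
        else:
            victim_closed = False
            os.close(fd_b)
    finally:
        os.remove(path_a)
        os.remove(path_b)
    return reused, victim_closed


if __name__ == "__main__":
    print(errno.errorcode[double_close_errno()])
    print(retry_closes_unrelated_file())
```

In a single-threaded process the number is normally reused right away, because POSIX hands out the lowest free descriptor. In a multithreaded process the victim can be a descriptor opened by any other thread, which is what makes the retry dangerous.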
The performance of this implementation is roughly the same as CPython's. When I first tested it, I was shocked to discover that it was 2.5 times slower, but then noticed that `StreamBox` always calls `Flush` after each write and that was slowing things down significantly. I included a hack to avoid it for `PosixFileStream` and it helped.

Module `mmap` is due for some cleanup, which is outside of the scope of this PR.

Unfortunately I had to leave Mono behind, that is, on `FileStream`. Between the limitations/bugs of the `MemoryMappedFile` interface and `FileStream`, using `PosixFileStream` on Mono (or even `UnixStream`) would create a regression in `mmap`.

This PR also makes sure that all file descriptors created via `open` or `io.open` on .NET/POSIX are non-inheritable, in line with #1225.
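The non-inheritable behaviour can be checked from Python code. The sketch below assumes CPython 3.4 or later, where descriptors are non-inheritable by default (PEP 446), and serves as the reference behaviour; it is not a test from this PR, and the helper name `inheritable_flags` is hypothetical.

```python
import os
import tempfile


def inheritable_flags(path):
    """Return the inheritable flag for descriptors from os.open and open()."""
    # Low-level path: descriptor created directly with os.open.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        low_level = os.get_inheritable(fd)
    finally:
        os.close(fd)

    # High-level path: descriptor owned by a file object from open().
    with open(path, "rb") as stream:
        high_level = os.get_inheritable(stream.fileno())

    return low_level, high_level


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(inheritable_flags(os.path.join(tmp, "probe.bin")))
```

Both flags are expected to be `False`, meaning that a child process started via `exec` does not receive these descriptors unless they are explicitly made inheritable with `os.set_inheritable`.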